28 research outputs found

    DH-PTAM: A Deep Hybrid Stereo Events-Frames Parallel Tracking And Mapping System

    Full text link
    This paper presents a robust approach for a visual parallel tracking and mapping (PTAM) system that excels in challenging environments. Our proposed method combines the strengths of heterogeneous multi-modal visual sensors, including stereo event-based and frame-based sensors, in a unified reference frame through a novel spatio-temporal synchronization of stereo visual frames and stereo event streams. We employ deep learning-based feature extraction and description to further enhance robustness. We also introduce an end-to-end parallel tracking and mapping optimization layer complemented by a simple loop-closure algorithm for efficient SLAM behavior. Through comprehensive experiments on both small-scale and large-scale real-world sequences of the VECtor and TUM-VIE benchmarks, our proposed method (DH-PTAM) demonstrates superior performance compared to state-of-the-art methods in terms of robustness and accuracy in adverse conditions. Our implementation's research-based Python API is publicly available on GitHub for further research and development: https://github.com/AbanobSoliman/DH-PTAM. Comment: Submitted for publication in IEEE RA-
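
    The abstract does not detail the spatio-temporal synchronization itself; the sketch below (a minimal illustration, not the DH-PTAM implementation) shows one common way to align an event stream with frame timestamps by accumulating the events falling inside a temporal window around each frame into a signed event image. The (t, x, y, polarity) event layout and the function name are assumptions.

```python
import numpy as np

def accumulate_events(events, frame_ts, window, height, width):
    """Align an event stream to frame timestamps by accumulating, for each
    frame, the events inside a temporal window centred on the frame time.

    events   : (N, 4) array of [t, x, y, polarity], polarity in {-1, +1}
    frame_ts : (M,) array of frame timestamps (same clock as the events)
    window   : half-width of the accumulation window, in seconds
    Returns an (M, height, width) stack of signed event images.
    """
    t = events[:, 0]
    images = np.zeros((len(frame_ts), height, width), dtype=np.float32)
    for i, ts in enumerate(frame_ts):
        # Events are assumed sorted by time, so a binary search bounds each slice.
        lo = np.searchsorted(t, ts - window)
        hi = np.searchsorted(t, ts + window)
        chunk = events[lo:hi]
        xs = chunk[:, 1].astype(int)
        ys = chunk[:, 2].astype(int)
        np.add.at(images[i], (ys, xs), chunk[:, 3])
    return images
```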

    PHROG: A Multimodal Feature for Place Recognition

    Get PDF
    Long-term place recognition in outdoor environments remains a challenge due to strong appearance changes in the environment. The problem becomes even more difficult when two scenes must be matched using information from different visual sources, particularly different spectral ranges. For instance, an infrared camera is helpful for night vision in combination with a visible camera. In this paper, we focus on testing common feature point extractors under both constraints: repeatability across spectral ranges and long-term appearance change. We develop a new feature extraction method dedicated to improving repeatability across spectral ranges. We evaluate feature robustness on long-term datasets acquired with different imaging sources (optics, sensor sizes and spectral ranges) using a Bag-of-Words approach. Our tests demonstrate that the method significantly improves image retrieval in a visual place recognition context, particularly when images from different spectral ranges such as infrared and visible must be associated: we evaluate our approach using visible, Near InfraRed (NIR), Short Wavelength InfraRed (SWIR) and Long Wavelength InfraRed (LWIR) imagery.
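
    PHROG is only described at a high level here; as a generic illustration of the Bag-of-Words retrieval pipeline the evaluation relies on (not the PHROG descriptor itself), the sketch below clusters local descriptors into a visual vocabulary, encodes each image as a word histogram and ranks candidate places by histogram intersection. All names and parameters are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(descriptor_sets, n_words=256):
    """Cluster local descriptors from many training images into visual words."""
    stacked = np.vstack(descriptor_sets)
    return KMeans(n_clusters=n_words, n_init=10).fit(stacked)

def bow_histogram(descriptors, vocabulary):
    """Encode one image as a normalised histogram of visual-word occurrences."""
    words = vocabulary.predict(descriptors)
    hist = np.bincount(words, minlength=vocabulary.n_clusters).astype(np.float64)
    return hist / max(hist.sum(), 1.0)

def rank_places(query_hist, database_hists):
    """Rank database images by histogram intersection with the query."""
    scores = [np.minimum(query_hist, h).sum() for h in database_hists]
    return np.argsort(scores)[::-1]
```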

    SWIR Camera-Based Localization and Mapping in Challenging Environments

    Get PDF
    This paper assesses a monocular localization system for complex scenes. The system is carried by a moving agent in a complex environment (smoke, darkness, indoor-outdoor transitions). We show how using a short-wave infrared (SWIR) camera with a potential lighting source is a good compromise that requires only a slight adaptation of classical simultaneous localization and mapping (SLAM) techniques. This choice makes it possible to obtain relevant features from SWIR images and to limit tracking failures due to the lack of keypoints in such challenging environments. In addition, we propose a tracking failure recovery strategy that allows tracking re-initialization with or without the use of other sensors. Our localization system is validated on real datasets generated from a moving SWIR camera in an indoor environment. The results obtained are promising and lead us to consider integrating our mono-SLAM into a complete localization chain that includes a data fusion process over several sensors.
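
    The recovery strategy is not specified beyond allowing re-initialization with or without other sensors; a minimal sketch of such a policy might look as follows, with `tracker` and `relocalizer` as hypothetical components: track normally, attempt relocalisation against the map when tracking fails, and re-initialise after staying lost for too long.

```python
from enum import Enum, auto

class TrackerState(Enum):
    TRACKING = auto()
    LOST = auto()

def process_frame(frame, tracker, relocalizer, max_lost_frames=30):
    """One step of a tracking loop with a simple failure-recovery policy.
    `tracker` and `relocalizer` are hypothetical components of this sketch."""
    if tracker.state == TrackerState.TRACKING:
        pose = tracker.track(frame)
        if pose is None:                     # not enough matched keypoints
            tracker.state = TrackerState.LOST
            tracker.lost_count = 0
        return pose
    # Lost: try to relocalise against previously mapped keyframes.
    pose = relocalizer.relocalize(frame)
    if pose is not None:
        tracker.state = TrackerState.TRACKING
        return pose
    tracker.lost_count += 1
    if tracker.lost_count > max_lost_frames:
        tracker.reinitialize(frame)          # start a new local map
        tracker.state = TrackerState.TRACKING
    return None
```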

    Real-Time Multi-SLAM System for Agent Localization and 3D Mapping in Dynamic Scenarios

    Get PDF
    This paper introduces a wearable SLAM system that performs indoor and outdoor SLAM in real time. The related project is part of the MALIN challenge, which aims at creating a system to track emergency response agents in complex scenarios (such as dark environments, smoke-filled rooms, repetitive patterns, building floor transitions and doorway crossings) where GPS technology is insufficient or inoperative. The proposed system fuses different SLAM technologies to compensate for the lack of robustness of each, while estimating the pose individually. LiDAR and visual SLAM are fused with an inertial sensor in such a way that the system is able to maintain GPS coordinates, which are sent via radio to a ground station for real-time tracking. More specifically, LiDAR and monocular vision technologies are tested in dynamic scenarios where the main advantages of each are evaluated and compared. Finally, 3D reconstruction with up to three levels of detail is performed.
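
    The fusion scheme is only summarised here; as a rough illustration (an assumption, not the paper's method), one simple way to combine individually estimated poses is to select the most confident subsystem and fall back to the inertial prediction when all of them fail:

```python
def fuse_pose_estimates(estimates, imu_prediction):
    """Pick the most trustworthy pose among independently estimated ones
    (e.g. LiDAR SLAM, visual SLAM), falling back to the inertial prediction
    when every subsystem reports failure.

    estimates      : dict mapping a subsystem name to (pose, confidence),
                     or to None when that subsystem is currently lost
    imu_prediction : pose predicted by inertial dead-reckoning
    """
    best_pose, best_conf = imu_prediction, 0.0
    for name, result in estimates.items():
        if result is None:
            continue
        pose, confidence = result
        if confidence > best_conf:
            best_pose, best_conf = pose, confidence
    return best_pose
```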

    Multimodal visible/infrared visual localisation for autonomous navigation

    No full text
    Autonomous navigation covers the set of methods that automate the movements of a mobile robot. This thesis focuses on outdoor localisation in urban and peri-urban environments under additional constraints: the exclusive use of visual sensors with variable specifications (geometry, modality, etc.) and long-term appearance changes of the surrounding environment, a combination of constraints still rarely studied in the state of the art. Our main contribution concerns the description and compression of the data extracted from images as a visual-words histogram, a method we call PHROG (Plural Histograms of Restricted Oriented Gradients). Experiments on several image datasets covering different visible and infrared modalities show an improvement in scene recognition performance compared to state-of-the-art methods. We then exploit the sequential nature of the images acquired in a navigation context to filter out aberrant localisation estimates. A Bayesian probabilistic framework yields two filtering applications: the first defines a simple robot motion model with a histogram filter, and the second builds a more elaborate model using visual odometry within a particle filter.
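
    As a minimal sketch of the first filtering solution (a histogram filter over a simple motion model), the code below performs one discrete Bayes predict/update step over an ordered set of mapped places, using image-retrieval scores as the measurement likelihood; the motion probabilities and the score-based likelihood are assumptions of this sketch.

```python
import numpy as np

def histogram_filter_step(belief, retrieval_scores, p_stay=0.3, p_forward=0.7):
    """One predict/update step of a discrete Bayes (histogram) filter over an
    ordered sequence of mapped places.

    belief           : (P,) prior probability of being at each place
    retrieval_scores : (P,) non-negative image-retrieval similarities used as
                       the measurement likelihood (assumption of this sketch)
    """
    # Predict: the robot either stays at its place or advances to the next one.
    predicted = p_stay * belief + p_forward * np.roll(belief, 1)
    predicted[0] = p_stay * belief[0]          # no wrap-around to the first place
    # Update: weight the prediction by the retrieval likelihood and normalise.
    posterior = predicted * retrieval_scores
    total = posterior.sum()
    return posterior / total if total > 0 else np.full_like(belief, 1.0 / len(belief))
```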

    IBISCape: A Simulated Benchmark for multi-modal SLAM Systems Evaluation in Large-scale Dynamic Environments

    No full text
    The development of high-fidelity SLAM systems depends on their validation on reliable datasets. Towards this goal, we propose IBISCape, a simulated benchmark that includes data synchronization and acquisition APIs for telemetry from heterogeneous sensors: stereo-RGB/DVS, LiDAR, IMU and GPS, along with ground-truth scene segmentation, depth maps and vehicle ego-motion. Our benchmark is built upon the CARLA simulator, whose back-end is the Unreal Engine, rendering highly dynamic scenery that simulates the real world. Moreover, we offer 43 datasets for Autonomous Ground Vehicle (AGV) reliability assessment, including scenarios for scene-understanding evaluation such as accidents, along with a wide range of frame quality produced by a dynamic weather simulation class integrated with our APIs. We also introduce the first calibration targets in CARLA maps, to solve the problem of the unknown distortion parameters of the CARLA-simulated DVS and RGB cameras. Furthermore, we propose a novel pre-processing layer that eases the integration of DVS sensor events into any frame-based Visual-SLAM system. Finally, extensive qualitative and quantitative evaluations of the latest state-of-the-art Visual, Visual-Inertial and LiDAR SLAM systems are performed on various IBISCape sequences collected in simulated large-scale dynamic environments.
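
    The pre-processing layer itself is not described in detail; a minimal sketch of the general idea, rendering a slice of DVS events into a conventional 8-bit image that a frame-based front-end can ingest, might look as follows (the event layout and normalisation are assumptions, not IBISCape's actual layer):

```python
import numpy as np

def events_to_frame(events, height, width):
    """Render one slice of DVS events into an 8-bit grayscale image so that a
    conventional frame-based Visual-SLAM front-end can consume it.
    events : (N, 4) array of [t, x, y, polarity], polarity in {-1, +1}.
    """
    img = np.zeros((height, width), dtype=np.float32)
    xs = events[:, 1].astype(int)
    ys = events[:, 2].astype(int)
    np.add.at(img, (ys, xs), events[:, 3])         # signed accumulation per pixel
    peak = np.abs(img).max()
    if peak > 0:
        img /= peak                                # scale to [-1, 1]
    return ((img + 1.0) * 127.5).astype(np.uint8)  # map to [0, 255], 128 = no events
```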

    A General Two-Branch Decoder Architecture for Improving Encoder-Decoder Image Segmentation Models

    No full text
    Recently, many methods with complex structures have been proposed to address image parsing tasks such as image segmentation. These carefully designed structures are hard to reuse flexibly and have a heavy footprint. This paper focuses on a popular semantic segmentation framework known as encoder-decoder, and points out that existing decoders do not fully integrate the information extracted by the encoder. To alleviate this issue, we propose a more general two-branch paradigm, composed of a main branch and an auxiliary branch, that does not increase the number of parameters, together with a boundary-enhanced loss computation strategy that lets the two decoder branches learn complementary information adaptively instead of explicitly assigning each branch a specific learning target. In addition, one branch learns the pixels that are difficult to resolve in the other branch, creating a competition between them that pushes the model to learn more efficiently. We evaluate our approach on two challenging image segmentation datasets and show its superior performance with different baseline models. We also perform an ablation study to tease apart the effects of different settings. Finally, we show that our two-branch paradigm achieves satisfactory results when the auxiliary branch is removed at the inference stage, so it can be applied to low-resource systems.
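
    The abstract describes the paradigm only at a high level; the PyTorch sketch below illustrates the general shape of a two-branch decoder head, where an auxiliary branch is supervised during training and skipped at inference. Layer sizes and names are assumptions, and no attempt is made here to match the parameter budget of a single-branch decoder.

```python
import torch.nn as nn

class TwoBranchDecoder(nn.Module):
    """Generic two-branch decoder head: a main branch used at train and test
    time, plus an auxiliary branch that is only evaluated during training."""

    def __init__(self, in_channels, num_classes):
        super().__init__()
        def branch():
            return nn.Sequential(
                nn.Conv2d(in_channels, in_channels // 2, 3, padding=1),
                nn.BatchNorm2d(in_channels // 2),
                nn.ReLU(inplace=True),
                nn.Conv2d(in_channels // 2, num_classes, 1),
            )
        self.main_branch = branch()
        self.aux_branch = branch()

    def forward(self, encoder_features):
        main_logits = self.main_branch(encoder_features)
        if self.training:
            aux_logits = self.aux_branch(encoder_features)
            return main_logits, aux_logits   # both supervised during training
        return main_logits                   # auxiliary branch skipped at inference
```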

    Multi-modal unsupervised domain adaptation for semantic image segmentation

    No full text
    We propose a novel multi-modal Unsupervised Domain Adaptation (UDA) method for semantic segmentation. Recently, depth has proven to be a relevant property for providing geometric cues that enhance the RGB representation. However, existing UDA methods either process RGB images alone or cultivate depth-awareness through an auxiliary depth estimation task. We argue that geometric cues crucial to semantic segmentation, such as local shape and relative position, are difficult to recover from an auxiliary depth estimation task using colour (RGB) information alone. In this paper, we propose a multi-modal UDA method named MMADT, which takes both RGB and depth images as input. In particular, we design a Depth Fusion Block (DFB) to recalibrate depth information and leverage Depth Adversarial Training (DAT) to bridge the depth discrepancy between the source and target domains. Besides, we propose a self-supervised multi-modal depth estimation assistant network named Geo-Assistant (GA) to align the feature spaces of RGB and depth and to shape the sensitivity of MMADT to depth information. We observe significant performance improvements on multiple synthetic-to-real adaptation benchmarks, i.e., SYNTHIA-to-Cityscapes, GTA5-to-Cityscapes and SELMA-to-Cityscapes. Additionally, our multi-modal UDA scheme is easy to port to other UDA methods, with a consistent performance boost.
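
    The Depth Fusion Block is only described as recalibrating depth information; the sketch below shows one plausible form (an assumption, not the published block): a channel gate computed from the concatenated RGB and depth features rescales the depth features before they are fused into the RGB stream.

```python
import torch
import torch.nn as nn

class DepthFusionBlock(nn.Module):
    """Illustrative depth-recalibration block: a channel gate computed from
    the concatenated RGB and depth features rescales the depth features,
    which are then added to the RGB stream."""

    def __init__(self, channels, reduction=4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, rgb_feat, depth_feat):
        weights = self.gate(torch.cat([rgb_feat, depth_feat], dim=1))
        return rgb_feat + weights * depth_feat   # recalibrated depth fused into RGB
```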